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Abstract 

Background: Cancer cells harbor a large number of molecular alterations such as mutations, amplifications and 
deletions on DNA sequences and epigenetic changes on DNA methylations. These aberrations may dysregulate 
gene expressions, which in turn drive the malignancy of tumors. Deciphering the causal and statistical relations of 
molecular aberrations and gene expressions is critical for understanding the molecular mechanisms of clinical 
phenotypes. 

Results: In this work, we proposed a computational method to reconstruct ossociotion modules containing driver 
aberrations, passenger mRNA or microRNA expressions, and putative regulators that mediate the effects from 
drivers to passengers. By applying the module-finding algorithm to the integrated datasets of NCI-60 cancer cell 
lines, we found that gene expressions were driven by diverse molecular aberrations including chromosomal 
segments' copy number variations, gene mutations and DNA methylations, microRNA expressions, and the 
expressions of transcription factors. In-silico validation indicated that passenger genes were enriched with the 
regulator binding motifs, functional categories or pathways where the drivers were involved, and co-citations with 
the driver/regulator genes. Moreover, 6 of 1 1 predicted MYB targets were down -regulated in an MYB-siRNA treated 
leukemia cell line. In addition, microRNA expressions were driven by distinct mechanisms from mRNA expressions. 

Conclusions: The results provide rich mechanistic information regarding molecular aberrations and gene 
expressions in cancer genomes. This kind of integrative analysis will become an important tool for the diagnosis 
and treatment of cancer in the era of personalized medicine. 



Background 

Cancer cells harbor a large number of alterations at 
genetic, epigenetic and phenotypic levels. High-through- 
put screenings identified hundreds of somatic mutations 
(e.g., [1,2]), thousands of gene and microRNA expres- 
sion changes (e.g., [3,4]) and copy number variations (e. 
g., [5]), large-scale epigenetic variations (e.g., [6]) in a 
typical cancer cell. Only a small fraction of these altera- 
tions may drive the malignancy of cancers, whereas the 
majority of them are likely the by-products of chromatin 
instability and dysregulation of transcriptional/transla- 
tional apparatus. Separation of driver from passenger 
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aberrations is a key question of cancer genomics for its 
strong implications in prognosis and treatments. 

Finding the causal and mechanistic links connecting 
driver aberrations and clinical phenotypes is challenging 
due to the complexity of the underlying processes. 
Alternatively, molecular phenotypes such as gene 
expressions are considered. Genetic and epigenetic 
alterations on DNAs modulate the expressions and 
activities of key regulators such as transcription factors 
and microRNAs. Dysregulation of these molecules in 
turn affects a large number of downstream genes. The 
global gene expression and activity changes then influ- 
ence clinical phenotypes such as proliferation rates, drug 
resistance and capability of metastasis. 

Within the context of mechanisms we modify and 
extend the definitions of drivers and passengers. A dri- 
ver is a molecular aberration (mutation, copy number 
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variation, DNA methylation, etc.) that causes variations 
of gene expressions. A passenger is a protein-coding 
gene or microRNA whose expression is modulated by 
the driver molecular aberration. Notice our definitions 
of drivers and passengers specify mechanistic/causal 
relations of genes but do not require information about 
clinical phenotypes. We term the conventional defini- 
tions of drivers and passengers as phenotypic and the 
extended definitions as mechanistic. A phenotypic driver 
is usually also a mechanistic driver, since it may affect 
clinical phenotypes by altering the regulatory programs 
in cancer cells. Conversely, a mechanistic driver may 
not be a phenotypic driver if the altered passenger genes 
do not pertain to the clinical phenotypes. Similarly, a 
mechanistic passenger can be a phenotypic driver if the 
expression change of the gene affects the cancer 
phenotype. 

Comprehensive characterization of diverse molecular 
aberrations in tumors is a major trend of cancer research 
in the post-genomic era. Several international consortia 
and research institutions have launched large-scale pro- 
jects to catalog the genomic, transcriptomic and epigenetic 
changes across multiple tumor types. Two noted endea- 
vors are The Cancer Genome Atlas (TCGA) [7] and the 
International Cancer Genome Consortium (ICGC) [8]. 
Beyond these coordinated efforts, comprehensive assays 
on the NCI-60 cell lines have been performed by distinct 
research groups over the last two decades (e.g., [9-19]). 

As large-scale, comprehensive cancer genomic data 
become more abundant, it is essential to develop com- 
putational tools to integrate heterogeneous data in order 
to acquire a systematic understanding of cancer cells. A 
rich collection of previous studies attempted to identify 
the driver mutations responsible for tumorigenesis from 
cancer genomic data (e.g., [2,20]). These studies 
hypothesized that the driver mutations conferred selec- 
tive advantages for tumor growth thus occurred in mul- 
tiple independent instances. Accordingly they aimed for 
finding recurrent aberrations with high frequencies. 
Beyond single genes a variety of computational tools 
have been proposed to examine the abnormal pathway 
activities by combining the molecular information of 
their constituent genes, such as the gene set enrichment 
analysis [21], principal component analysis [22], factor 
graph models [23], and others (e.g., [24]). Causal and 
mechanistic relations of molecular aberrations and gene 
expressions were not addressed in these studies. Some 
studies tracked the causes of abnormal gene expressions 
by correlating them with DNA copy numbers, gene 
mutations, DNA methylations or microRNA expressions 
(e.g., [7,25-27,4]). However, these studies were often 
restricted to pairwise comparisons between two types of 
data and lacked a unifying framework to integrate multi- 
ple types of data in the same model. 



Beyond cancer data analysis many generic computa- 
tional models of data integration have been proposed. 
Causal relations were constrained by a variety of proper- 
ties such as conditional independence of observed vari- 
ables (e.g., [28,29]), molecular interactions and pathways 
(e.g., [30]), and data generated from intervention experi- 
ments (e.g., [31]). While many of these tools have been 
applied to cancer data, most of the questions of interest 
are phenotype- driven (e.g., finding genes responsible for 
a clinical trait) rather than mechanism-driven (e.g., find- 
ing the causal relations connecting driver aberrations 
and passenger gene expressions). 

Akavia et al. inferred mechanistic relations of gene 
expressions and phenotypes in melanoma [32]. They 
identified the gene copy number changes that modu- 
lated their own expressions and indirectly affected the 
expressions of their target genes. Curiously, these 
mechanistic drivers were also phenotypic drivers, as 
their knock-downs abrogated tumor cell proliferation. 
Despite the value of connecting mechanistic and pheno- 
typic characterization, this study was restricted to copy 
number variations and discarded other types of molecu- 
lar aberrations. 

Recently, we proposed a layered modeling framework 
for integrative analysis of cancer genomic data [33]. The 
goal was to explain gene expressions with observed 
molecular aberrations. Associations with molecular aber- 
rations were incrementally included according to levels 
of uncertainty and mechanistic information. In this 
work, we extended and modified the layered modeling 
framework to identify association modules of driver 
molecular aberrations and passenger gene/mircroRNA 
expressions from the integrated datasets of NCI-60 cell 
lines. For each type of molecular aberrations, we found 
the downstream passengers putatively affected by the 
drivers and regulators mediating the effects from drivers 
to passengers. The causal relations between drivers/reg- 
ulators and passengers were supported by both experi- 
mental and in-silico validations. The analysis results 
justify the utility of association modules for other cancer 
genomic data. 

Results 

Construction of association modules from integrated 
datasets 

Diverse molecular aberrations can modulate variations 
of gene expressions in cancer cells. The dependencies 
between molecular aberrations and gene expressions 
may be modular, as multiple genes with coherent 
expression profiles are likely to be affected by common 
molecular aberrations. The goal of this study is to find 
the modules of genes or microRNAs whose coherent 
expression profiles are possibly driven by common 
molecular aberrations. We define an association module 
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as a tuple consisting of three components: (l)observed 
driver molecular aberrations, (2)passen-ger genes or 
microRNAs whose expression profiles are associated 
with driver molecular aberrations, (3)regulators (tran- 
scription factors) that mediate the effects between dri- 
vers and passengers. We consider the following types of 
association modules: 

1. Cis-acting effects with copy number variations of 
chromosomal segments. The copy number variation 
(CNV) of a chromosomal segment is associated with 
the expressions of its constituent genes or 
microRNAs. 

2. Trans-acting effects with copy number variations of 
chromosomal segments, A chromosomal segment CNV 
possesses cis-acting effects with intermediate regula- 
tors on the segment, and both segment CNV and reg- 
ulator expressions are associated with the expressions 
of genes or microRNAs on other segments. 



3. Effects with gene mutations. The mutational state 
of a gene is associated with the expressions of itself 
and other genes or microRNAs. 

4. Effects with DNA methylations. The coherent 
DNA methylation state of a collection of genes is 
negatively associated with the expressions of genes 
or microRNAs. 

5. Regulatory effects with microRNAs. The coherent 
expressions of a collection of microRNAs are nega- 
tively associated with the expressions of a collection 
of genes. 

6. Regulatory effects with transcription factors. The 
coherent expressions of a collection of transcription 
factors are associated with the expressions of a col- 
lection of other genes. 

The association modules are visually summarized in 
Figure 1 and elaborated in Materials and Methods and 
Additional file 1, Text SI. 
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Figure 1 Types of association modules. From top-left to bottom-right: cis-acting effects with copy number variations of chromosomal 
segments, trans-acting effects with copy number variations of chromosomal segments, associations with gene mutations, associations with DNA 
methylations, regulatory effects with microRNAs, regulatory effects with transcription factors. Black solid lines: information flows from central 
dogma. Green dashed lines: regulatory links from transcription factors or microRNAs to their targets on other chromosomal locations. Red dotted 
lines: associations between observed aberrations and mRNA/microRNA expressions. Arrows indicate positive associations and bar-ends indicate 
negative associations. 
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Associations between driver aberrations and passenger 
expressions are established by a series of hypothesis test- 
ing procedures using logistic regression models. In brief, 
an association between a driver and a passenger is 
selected only if it cannot be replaced by any other asso- 
ciation (with the same passenger) without a significant 
loss of the explanatory power. Procedures of construct- 
ing association modules are described in Materials and 
Methods and Additional file 1, Text SI. 

Diverse molecular aberrations drive mRNA responses in 
NCI-60 cell lines 

We identified association modules from integrated data- 
sets of NCI-60 cell lines, including mRNA and micro- 
RNA expressions, copy number variations, mutations, 
and DNA methylations (see Materials and Methods for 
the data sources and processing). 6888 genes possessed 
coherent expression profiles among three mRNA data- 
sets. 48.16% (3317 of 6888) of the valid mRNA expres- 
sion profiles were explained by observed molecular 
aberrations. Intriguingly, each type of association mod- 
ules in Figure 1 were detected, suggesting diverse mole- 
cular mechanisms drive mRNA expressions. 
Summary of mRNA association modules 
Table 1 shows the summary information of association 
modules for mRNA expressions. Associations with seg- 
ment CNVs dominate the inferred modules. 1699 of 
3317 passenger expressions explained by observed aber- 
rations are associated with CNVs on their own seg- 
ments, and 43 of 84 association modules are cis-acting 
effects with segment CNVs. Trans-acting effects with 
segment CNVs comprise 18 association modules and 
506 passenger genes. 522 passenger expressions and 14 
association modules are positively or negatively asso- 
ciated with mutations of 8 oncogenes and tumor sup- 
pressors. 161 passenger expressions and 5 association 
modules are negatively associated with DNA methyla- 
tions. Two association modules with microRNA expres- 
sions comprise 121 and 120 passenger genes 
respectively. Moreover, two association modules with 
transcription factor expressions comprise 338 and 338 
passenger genes respectively. The complete list of asso- 
ciation modules are reported in Additional file 2, Table 
SI, and driver aberrations and passenger mRNA expres- 
sions are visualized in Additional file 1, Figures SI, S2, 
S3, S4, S5, S6, S7, S8, S9, S10, Sll, S12 and S13. The 
complete list of microRNA association modules are 
reported in Additional file 3, Table S2, and driver aber- 
rations and passenger microRNA expressions are visua- 
lized in Additional file 1, Figures S14, S15, S16. 

The credibility of associations between multiple vari- 
ables was assessed by false discovery rates (FDR, 



[34,35]). We adopted the permutation tests described in 
[36] as the null model and evaluated two types of FDRs: 

expected # false positives according to the null model # false positives in the 99 percentile of the null model 
#positive calls from the date # positive calls from the date 

Table 2 shows the FDRs for each type of associations 
and for all associations together. The FDRs for cis and 
trans acting effects with segment CNVs are substantially 
smaller than those for other types of associations. This 
is sensible as the cis-acting effects with segment CNVs 
are constrained by chromosomal locations of passenger 
genes, and the trans-acting effects with segment CNVs 
are constrained by coherent segment CNVs and expres- 
sions of their regulators. In contrast, other types of asso- 
ciations possess no additional constraints beyond driver 
aberrations and passenger expressions, thus are more 
likely to be spurious. Furthermore, the FDRs calculated 
from the expected number of false positives are lower 
than those calculated from the 99 percentile of the null 
distribution. This is also sensible since the latter gives a 
much more conservative estimate of the false positive 
numbers. The overall FDRs calculated by these two 
methods are 0.235 and 0.326 respectively. 

To justify the biological meanings of association mod- 
ules, we conducted several in-silico validations based on 
prior literature. First, we extracted putative targets of 
transcription factors from the TRANSFAC database [37] 
and those of microRNAs from three databases ([38-40]). 
For each association module, we then evaluated the 
enrichment p-values for the putative targets of their reg- 
ulators (transcription factors or microRNAs) in the pas- 
senger genes. Table 3 reports the enrichment p-values 
for the putative targets of regulators in the association 
modules. Among the 14 modules whose regulators pos- 
sess putative target information, 9 of them are signifi- 
cantly enriched (p-value < 0.05) with putative targets in 
their passenger genes. All but one modules containing 
more than 60 passenger genes are enriched with puta- 
tive targets of their regulators. Only one large module - 
positive associations with APC mutations - is not 
enriched with the putative targets of the regulator since 
APC is not a transcription factor and has no binding 
motifs. 

Second, we incurred a batch search for all pairs of 
drivers/regulators and passengers of association mod- 
ules on NCBI PubMed database and checked whether 
they were co-cited in previous publications. Table 4 
summarizes the information of co-cited (driver/regula- 
tor, passenger) pairs for each association module. In 
total, 449 (driver/regulator, passenger) pairs co-occur 
in the same publications, and 22 of 46 association 
modules with non-local aberrations have co-cited (dri- 
ver/regulator, passenger) pairs. To assess the signifi- 
cance of co-citations we randomly sampled passenger 
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Table 1 Summary information of association modules for mRIMA expressions 
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+ 


23 


80 


methylation 


COL5A1 


NA 




10 


38 


intra 


CNV 


seg 27 


(chr 


5q) 


NA 


+ 


22 


81 


microRNA 


mir group 1 


NA 




121 


39 


intra 


CNV 


seg 39 


(chr 


9p) 


NA 


+ 


22 


82 


microRNA 


mir group 2 


NA 




120 


40 


intra 


CNV 


seg 12 


(chr 


2q) 


NA 


+ 


20 


83 


TF 


TF group 1 


NA 


+ 


338 


41 


intra 


CNV 


seg 74 


(chr 


19q) 


NA 


+ 


20 


84 


TF 


TF group 2 


NA 


+ 


338 


42 


intra 


CNV 


seg 83 


(chr 


22q) 


NA 


+ 


20 














43 


intra 


CNV 


seg 85 


(chr 


23pq) 


NA 


+ 


20 















Index: module index. Reg: intermediate regulators. Sign: the functional direction (activation or repression) from drivers to passengers. N: number of passenger 
genes. Intra CNV: intra-segment CNV. Inter CNV: inter-segment CNV. TF: transcription factor expressions. The locations of segments are reported in Supplementary 
Table S5. Mir group 1 consists of mir-92,mir-96,mir-106a,mir 20b,mir-17,mir-19b,mir-32,mir-135,mir-25,mir-106b,mir-93,mir-106,mir-18,mir-20. Mir group 2 consists 
of mir-24,mir-99b,mir-27b,mir-21,mir-125a,mir-23b,mir-27a,mir-23a. TF group 1 consists of SMAD3,FOXD1,PLAU,BDNF,FOSL2,FOSL1,RBMS1. TF group 2 consists of 
ERG,ELK4,NFATC1,RFX3,POU5F1,ZNF350. 



Li et al. BMC Systems Biology 201 1, 5:186 
http://www.biomedcentral.eom/1 752-0509/5/1 86 



Page 6 of 22 



Table 2 False discovery rates of mRIMA and microRNA 
association modules 



mRNA modules: 






FDR evaluation 


driver type 


FDR 


mean 


intra CNV 


0.0259 


mean 


inter CNV 


0.0035 


mean 


mutation 


0.3467 


mean 


methylation 


0.4334 


mean 


microRNA 


0.2248 


mean 


TF 


0.2197 


mean 


all 


0.2353 


99% 


intra CNV 


0.0855 


99% 


inter CNV 


0.0899 


99% 


mutation 


0.4949 


99% 


methylation 


0.6322 


99% 


microRNA 


0.3109 


99% 


TF 


0.2815 


99% 


all 


0.3262 


microRNA modules: 






FDR evaluation 


driver type 


FDR 


mean 


intra CNV 


NA 


mean 


inter CNV 


0.0021 


mean 


mutation 


0.2535 


mean 


methylation 


0.4403 


mean 


microRNA 


NA 


mean 


TF 


0.2857 


mean 


all 


0.2924 


99% 


intra CNV 


NA 


99% 


inter CNV 


0.0455 


99% 


mutation 


0.3645 


99% 


methylation 


0.6379 


99% 


microRNA 


NA 


99% 


TF 


0.3600 


99% 


all 


0.3943 



Mean: using the expected number of false positives to evaluate FDR. 99%: 
using the 99 percentile of the number of false positives to evaluate FDR. Intra 
CNV: intra-segment CNV. Inter CNV: inter-segment CNV. TF: transcription factor 
expressions. All: the FDR over all associations. 



genes of each module and counted the maximum 
number of co-cited pairs over 10 random trials. 
Among the 23 association modules possessing co-cited 
pairs, 7 of them have more than 1.4 fold of the num- 
ber of co-cited pairs compared to random samples 
(Table 4). The complete co-cited (driver/regulator, pas- 
senger) pairs among all association modules are 
reported in Additional file 4, Table S3. 

Third, for the passenger genes in each association 
module we evaluated the enrichment p-values for func- 
tional categories [41] and curated pathways ([42-44]). 
Table 5 reports representative GO categories and path- 
ways. 38 of the 84 association modules have at least one 
enriched GO category/pathway with p-value < 0.001. 



Table 3 Enrichment of driver/regulator binding motifs on 
passenger promoters of mRNA association modules 

index reg A/ n N 2 N 3 p-value 

44 MITF 118 2023 23 0.0464 

45 TBP 77 8009 56 5.3704 x 10" 4 

45 MYB 77 10684 65 0.0076 

46 SREBF2 49 9755 36 0.1581 

47 E2F3 31 8123 21 0.0989 
47 NFYA 31 2170 3 0.8513 
47 SRF 31 75 0 1.0 
51 NFATC3 22 2170 5 0.2097 
53 NFKB2 19 531 0 1.0 
66 TP53 35 3576 16 0.0041 
76 PAX8 61 5376 34 0.0014 

78 HOXC13 19 5421 12 0.0164 

79 PPARG 13 20 0 1.0 
81 mir-96 121 636 11 0.0147 
81 mir-106a 121 319 6 0.0463 
81 mir-17 121 830 12 0.0379 
81 mir-93 121 832 12 0.0385 

81 mir-106 121 718 11 0.0323 

82 mir-21 120 291 7 0.0093 

83 SMAD3 338 10025 271 1.058 x10" 7 

84 ERG 338 2231 67 0.0093 
84 ELK4 338 2803 76 0.0518 
84 NFATC1 338 10187 265 3.1517 x 1 0" 5 
84 RFX3 338 2751 75 0.0481 
84 POU5F 338 379 18 0.0027 

Index: module index. Reg: the transcription factor or microRNA with putative 
targets. Ny. number of passenger genes in a module. N 2 : total number of 
putative targets in human genes. N 3 : number of passenger genes containing 
the transcription factor binding motif, p-value: hyper-geometric p-value of 
enrichment. For modules with microRNAs as drivers, N 2 : number of putative 
targets of a microRNA. N 3 : number of passenger genes belonging to the 
putative targets of the microRNA. 

The complete list of enriched GO categories and path- 
ways are reported in Additional file 5, Table S4. 
Genes of cis-acting effects with segment CNVs exhibit a 
clustering tendency 

We partitioned the CGH data into 86 coherent seg- 
ments (Additional file 1, Text SI) and examined positive 
associations between segment CNVs and the mRNA 
expressions of their constituent genes. The cis-acting 
effects with segment CNVs constitute a substantial por- 
tion of the associations with mRNA expressions: 1699 of 
6888 mRNA expressions can be explained by the local 
segment CNVs. These associations are grouped by seg- 
ments into 43 modules. The members of association 
modules for local segment CNVs are reported in Addi- 
tional file 2, Table SI, and information about partitioned 
segments is reported in Additional file 6, Table S5. 

The passenger genes associated with local segment 
CNVs are not evenly distributed on chromosomes. Fig- 
ure 2 marks the locations of these passenger genes (blue 



MITF 


1 18 


2023 


23 


0.0464 


TBP 


77 


8009 


56 


5.3704 X 10 


MYB 


77 


10684 


65 


0.0076 


SREBF2 


49 


9755 


36 


0.1581 


E2F3 


31 


8123 


21 


r\ r\r\or\ 

0.0989 


NFYA 


31 


21 70 


3 


0.8513 


SRF 


31 


75 


0 


1 .0 


NFATC3 


22 


21 70 


5 


0.2097 


NFKB2 


19 


531 


0 


1 .0 


TP53 


35 


3576 


1 6 


0.0041 


PAX8 


61 


5376 


34 


0.0014 


HOXC13 


19 


5421 


1 2 


0.0164 


PPARG 


1 3 


20 


0 


1 .0 


mir-96 


121 


636 


1 1 


0.0147 


mir-106a 


121 


319 


6 


0.0463 


mir-17 


121 


830 


12 


0.0379 


mir-93 


121 


832 


12 


0.0385 


mir-106 


121 


718 


11 


0.0323 


mir-21 


120 


291 


7 


0.0093 


SMAD3 


338 


10025 


271 


1.058 x 10~ 7 


ERG 


338 


2231 


67 


0.0093 


ELK4 


338 


2803 


76 


0.0518 


NFATC1 


338 


10187 


265 


3.1517 x 10" 


RFX3 


338 


2751 


75 


0.0481 


POU5F 


338 


379 


18 


0.0027 



Li et al. BMC Systems Biology 201 1, 5:186 
http://www.biomedcentral.eom/1 752-0509/5/1 86 



Page 7 of 22 



Table 4 Summary of co-citations in mRNA and microRNA 
association modules 

mRNA modules 

index drivers/regulators A/ n N 2 N 3 

44 ZIC1,MITF seg 18 CNV (+) 118 14 6 

45 TBP,MYB,HDAC2 seg 31 CNV (+) 77 28 20 

46 SREBF2,EP300 seg 84 CNV (+) 49 3 2 

47 E2F3,NFYA,SRF seg 30 CNV (+) 315 4 

48 TCF7L2 seg 42 CNV (+) 30 1 3 
54 UBP1 seg 16 CNV (+) 18 10 

62 APC mutation (+) 136 17 10 

63 APC mutation (-) 56 3 6 

65 CDNK2A mutation (-) 43 3 6 

66 TP53 mutation (-) 35 11 3 

67 BRAF mutation (-) 34 3 3 

68 TP53 mutation (+) 33 4 6 

69 CDKN2A mutation (+) 32 6 3 

71 PTEN mutation (+) 20 1 2 

72 KRAS mutation (-) 16 1 1 

73 BRAF mutation (+) 16 3 1 

75 PTEN mutation (-) 13 2 4 

76 PAX8 methylation (-) 67 1 4 

77 BCR methylation (-) 58 5 10 

79 CCND1,PPARG methylation (-) 13 4 7 

80 COL5A1 methylation (-) 10 11 
83 TF group 1 (+) 338 236 48 
microRNA modules 



index 


drivers/regulators 


Ni 


N 2 


N 3 


4 


ZIC1,MITF seg 18 CNV (-) 


19 


3 


5 


5 


DNMT1 seg 15 CNV (+) 


15 


1 


2 


6 


SRF seg 30 CNV (+) 


13 


1 


1 


7 


MAP2K4 seg 66 CNV (-) 


13 


2 


2 


10 


TP53 mutation (+) 


17 


3 


3 


12 


APC mutation (+) 


10 


1 


1 


14 


BCR methylation (-) 


12 


1 


4 


15 


SMAD3,BDNF (+) 


28 


8 


10 


20 


HIF1A (+) 


14 


1 


2 



Index: module index. N^. number of passengers in the module. N 2 : number of 
co-cited driver/regulator-passenger pairs in the module. N 3 : max number of 
co-cited driver/regulator-passengers over 10 random modules of the same 
size. TF group 1 consists of SMAD3,FOXD1,PLAU,BDNF,FOSL2,FOSL1,RBMS. TF 
group 2 consists of ERG,ELK4,NFATC1,POU5F1,ZNF350. 



lines) on each chromosome. The passenger genes asso- 
ciated with local segment CNVs exhibit a clustering ten- 
dency. To rule out the possibility that the clustering 
patterns arise from the distributions of all protein-cod- 
ing genes on chromosomes, we also marked the loca- 
tions of passenger genes associated with non-local 
segment CNVs (trans-acting effects) in Figure 2 (red 
lines). Despite the presence of dense clusters, the pas- 
senger genes with trans-acting effects are more scattered 
on chromosomes. We evaluated the entropies (disper- 
sions) of gene densities for cis and trans-acting genes on 



each chromosome (Figure 2). The trans-acting genes 
have higher entropies than the cis-acting genes in the 
majority of the chromosomes, confirming the visual 
observation about their differences. 
Trans-acting effects with segment CNVs are mediated by 
transcription factors 

506 mRNA expressions are associated with non-local 
segment CNVs. One possible mechanism to establish 
these trans-acting associations is through intermediate 
regulators on segments: segment CNVs have cis-acting 
effects on the expressions of their constituent regula- 
tors, which in turn modulate the expressions of down- 
stream targets on other loci. We identified 18 
association modules with intermediate regulators and 
reported them in Table 1. Module 44 has segment 18 
(chromosome 3p) as the driver, MITF and ZIC1 as 
intermediate regulators, and 118 passengers. The driver 
CNV as well as regulator and passenger mRNAs all 
exhibit melanoma-specific elevation in NCI-60 cell 
lines (Figure 3). MITF encodes a transcription factor 
that regulates the differentiation and development of 
melanocytes and pigment cell-specific transcription of 
the melanogenesis enzyme genes. ZIC1 encodes a zinc 
finger transcription factor involved in brain develop- 
ment. Multiple lines of evidence support the associa- 
tions with MITF in this module. First, amplification of 
the chromosome 3p region covering MITF is consid- 
ered as a typical driver mutation in melanoma [26]. 
Second, the passenger genes in this module are 
enriched with experimentally validated MITF targets 
from [45] (14 of 118 genes, hyper-geometric p-value < 
1.036 x 10" 13 ), and with genes belonging to the GO 
category of melanosome (4 of 7 genes in the GO cate- 
gory, hyper-geometric p-value < 1.3 x 10" 7 ). Third, 
promoters of passenger genes are also enriched with 
MITF-binding motifs (Table 3, 23 of 118 passenger 
genes containing the motif, hyper-geometric p-value < 
0.0464). Fourth, MITF and 12 of the 118 passenger 
genes are co-cited in previous publications (Table 4). 

Module 45 has segment 31 (chromosome 6q) as the 
driver, MYB, TBP and HDAC2 as intermediate regula- 
tors, and 77 passengers. The driver CNV as well as reg- 
ulator and passenger mRNAs all exhibit leukemia- 
specific elevation (Figure 3). MYB (c-myb) is a transcrip- 
tion factor involved in cell cycle progression, cell prolif- 
eration and differentiation in hematopoiesis. 
Amplifications of this oncogene cause its abnormal 
expressions in leukemia and other solid tumors [46,47]. 
TBP is a TATA-binding protein belonging to the gen- 
eral transcription apparatus. HDAC2 is a histone deace- 
tylase that modifies chromatin structures and represses 
transcription. Associations of MYB and TBP with the 
passenger genes are supported by enrichment of their 
binding motifs on promoters (Table 3, 65 of 77 genes 



44 ZIC1,MITF seg 18 CNV (+) 118 14 

45 TBP,MYB,HDAC2 seg 31 CNV (+) 77 28 

46 SREBF2,EP300 seg 84 CNV (+) 49 3 

47 E2F3,NFYA,SRF seg 30 CNV (+) 31 5 

48 TCF7L2 seg 42 CNV (+) 30 1 
54 UBP1 seg 16 CNV (+) 18 1 

62 APC mutation (+) 136 17 

63 APC mutation (-) 56 3 

65 CDNK2A mutation (-) 43 3 

66 TP53 mutation (-) 35 1 1 

67 BRAF mutation (-) 34 3 

68 TP53 mutation (+) 33 4 

69 CDKN2A mutation (+) 32 6 

71 PTEN mutation (+) 20 1 

72 KRAS mutation (-) 16 1 

73 BRAF mutation (+) 16 3 

75 PTEN mutation (-) 13 2 

76 PAX8 methylation (-) 67 1 

77 BCR methylation (-) 58 5 

79 CCND1,PPARG methylation (-) 13 4 

80 COL5A1 methylation (-) 10 1 
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Table 5 Representative enriched GO categories and pathways in mRNA association modules 



index 


drivers/regulators 


N, 


function 


p-value 


44 


ZIC1,MITF seg 18 CNV (+) 


118 


melanosome 


1 .300 x 1 0" 7 


45 


TBP,MYB,HDAC2 seg 31 CNV (+) 


77 


RNA polymerase II regulation 


1.35 x 10~ 5 


45 


TBP,MYB,HDAC2 seg 31 CNV (+) 


77 


condensed chromosome 


7.928 x 1 0" 5 


46 


SREBF2,EP300 seg 84 CNV (+) 


49 


cholesterol biosynthesis 


3.781 x 10" 4 


47 


E2F3,NFYA,SRF seg 30 CNV (+) 


31 


ceramide signaling pathway 


4.302 x 1 0" 5 


47 


E2F3,NFYA,SRF seg 30 CNV (+) 


31 


NFKB signaling pathway 


8.632 x 1 0" 4 


48 


TCF7L2 seg 42 CNV (+) 


30 


multi-drug resistance factors 


5.884 x 10~ 5 


48 


TCF7L2 seg 42 CNV (+) 


30 


positive regulation of epithelial cell proliferation 


1 .096 x 1 0" 4 


49 


NR2C1,NFYB seg 49 CNV (+) 


22 


mRNA translation 


2.314 x 10~ 4 


54 


UBP1 seg 16 CNV (+) 


18 


RNA Pol II phosphorylation 


4.429 x 1 0" 4 


56 


APEX1 seg 57 (+) 


16 


cleavage of growing transcript 


7.872 x 1 0" 4 


57 


RPA2,EIF2C1 seg 16 CNV (+) 


15 


mRNA splicing 


1.945 x 10~ 6 


57 


RPA2,EIF2C1 seg 16 CNV (+) 


15 


mRNA Pol II transcription initiation 


7.260 x 1 0" 4 


59 


RICS seg 51 CNV (+) 


14 


regulation of apoptosis 


1 .397 x 1 0" 4 


62 


APC mutation (+) 


136 


liver development 


4.932 x 1 0" 4 


66 


TP53 mutation (-) 


35 


caspase activation 


1.101 x 10~ 3 


66 


TP53 mutation (-) 


35 


P53 signaling pathway 


4.138 x 10~ 4 


68 


TP53 mutation (+) 


33 


cell cycle 


9.787 x 1 0" 4 


69 


CDKN2A mutation (+) 


32 


cell motility 


8.510 x 10~ 4 


70 


KRAS mutation (+) 


25 


DNA replication 


2.689 x 10~ 3 


71 


PTEN mutation (+) 


20 


JNK cascade 


2.904 x 10~ 4 


75 


PTEN mutation (-) 


13 


positive regulation of I-kB kinase/NF-^B cascade 


1.745 x 10~ 3 


79 


CCND1,PPARG methylation (-) 


13 


negative regulation of cell proliferation 


6.074 x 10~ 3 


81 


mir group 1 (-) 


121 


transport vesicle 


3.297 x 10~ 4 


82 


mir group 2 (-) 


121 


translation 


3.694 x 10~ 4 


83 


TF group 1 (+) 


338 


integrin signaling pathway 


4.664 x 10~ 8 


83 


TF group 1 (+) 


338 


TGFp signaling pathway 


9.555 x 10~ 4 


84 


TF group 2 (+) 


338 


T cell differentiation 


1.128 x 10~ 4 



Index: module index. N^. number of passengers in the module. Function: enriched GO category or pathway, p-value: hyper-geometric p-value for enrichment. Mir 
group 1 consists of mir-92,mir-96,mir-106a,mir 20b,mir-17,mir-19b,mir-32,mir-135,mir-25,mir-106b,mir-93,mir-106,mir-18,mir-20. Mir group 2 consists of mir-24,mir- 
99b,mir-27b,mir-21,mir-125a,mir-23b,mir-27a,mir-23a. TF group 1 consists of SMAD3,FOXD1,PLAU,BDNF,FOSL2,FOSL1,RBMS. TF group 2 consists of ERG,ELK4, 
NFATC1,POU5F1,ZNF350. 



for MYB motifs, hyper-geometric p-value < 0.0076, 56 of 
77 genes for TBP motifs, hyper-geometric p-value < 
5.370 x 10" 4 ) and co-citations of regulators and passen- 
gers in literature (Additional file 4, Table S3, 14 passen- 
ger genes are co-cited with MYB, and 8 passenger genes 
are co-cited with TBP). Intriguingly, the passenger genes 
in this module are highly enriched with the GO cate- 
gories or pathways involved in generic DNA and RNA 
synthesis such as RNA Pol II regulation and chromatin 
remodeling (Table 5). The elevated activities of DNA 
and RNA synthesis probably reflect high division rates 
of leukemia: leukemia has the second lowest (next to 
colon cancers) average doubling time among the NCI- 
60 cell lines [48]. 

Associations do not necessarily imply causality as mul- 
tiple types of causal structures may yield the same statis- 
tical associations [29]. We confirmed the causal relations 
of MYB and its associated genes by intervention experi- 
ments. We selected 11 putative target genes whose 



mRNAs were associated with segment 31 CNV and 
MYB expressions in multiple datasets (see Materials and 
Methods and Additional file 1, Text SI). As a control 
we selected 6 additional genes with high expression 
levels across all 60 cell lines. The K562 leukemia cell 
line was treated with c-myb siRNA, and the expression 
responses of putative MYB targets and control genes 
were measured by qPCR. Expressions of 6 of 11 putative 
MYB targets were down-regulated with the t-test p- 
value < 0.05 under the c-myb siRNA treatment: CTCF, 
KHDRBS1, NFATC3, ORC1L, PAICS, and ZNF131, and 
RBMX was significantly up-regulated (Figure 4, green 
bars; Additional file 7, Table S6). In contrast, the control 
genes were not differentially expressed under the c-myb 
siRNA treatment (Figure 4, blue bars; Additional file 7, 
Table S6). The primers and the probe used for detecting 
each gene are listed in Additional file 8, Table S7. 

Module 46 has segment 84 (chromosome 22q) as the 
driver, SREBF2 and EP300 as intermediate regulators, 
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Figure 2 Clustering tendency of cis-acting effects with segment CNVs. Top: Locations of passenger genes explained by segment CNVs. Blue 
lines on the bottom mark the locations of passenger genes explained by cis-acting effects. Red lines on the top mark the locations of passenger 
genes explained by trans-acting effects. Bottom: Evenness (entropies) of the distributions of cis and trans acting genes on each chromosome. X- 
axis shows the entropies of cis-acting genes, Y-axis shows the entropies of trans-acting genes. 
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TF53 mutation 



BR CNS CO LE ME LC OV PR RE 

Figure 3 Gene expressions of two segment CNV association modules and one mutation association module. Top: From the top to 
bottom: segment 18 (chr 3p) CNV, ZIC1 expressions, MITF expressions, and expressions of passenger genes. Middle: From the top to bottom: 
segment 31 (chr 6q) CNV, HDAC2 expressions, TBP expressions, MYB expressions, and expressions of passenger genes. Bottom: From the top to 
bottom: TP53 mutation states and expressions of passenger genes. Passengers are ordered by their similarity to driver aberration profiles. The 
top row has the highest similarity. BR: breast cancers, CNS: brain tumors, CO: colorectal cancers, LE: leukemias, ME: melanomas, LC: lung cancers, 
OV: ovarian cancers, PR: prostate cancers, RE: renal cancers. 



and 49 passengers. SREBF2 encodes a sterol regulatory 
element binding transcription factor. The SREBF2-bind- 
ing motif is not enriched on the passenger promoters 
(hyper-geometric p-value < 0.1581, Table 3). However, 
two passenger genes are involved in cholesterol bio- 
synthesis (FDFT1 and HMGCR), and the enrichment is 
significant (p-value < 3.781 x 10" 4 , Table 5). 

Module 47 has segment 30 (chromosome 6p) as the 
driver, E2F3, NFYA and SRF as intermediate regulators, 
and 31 passengers. The passenger genes are moderately 



enriched with E2F3-binding motifs (Table 3, 21 of 31 
genes, p-value < 0.0989), and five (regulator, passenger) 
pairs are co-cited in the same studies (Table 4). SRF 
(serum response factor) is known to regulate RAF1 (ser- 
ine/threonine-protein kinase), a member of passenger 
genes [49]. Moreover, SRF and RAF1 co-participate in 
multiple signaling pathways including IGF1, PDGF, and 
MAPK pathways (Additional file 5, Table S4). SRF is also 
associated with two other passenger genes (RELA and 
SMAD4) in previous studies (Additional file 4, Table S3). 



Li et al. BMC Systems Biology 201 1, 5:186 
http://www.biomedcentral.eom/1 752-0509/5/1 86 



Page 11 of 22 




untreated lie D-myb siRNA centre- 1 siRNA 



1 

u 

Q 2.5 
— 

h 



S3 

£ E 1.5 

p 

U 1 



£ 



0.5 



I 



rfi 



*r- CO 



eo -i < il 

B S o " 
g s u 



ta v> 

□ u 
r 



P> T- 

□ U> 

-I -I 

o o 

EL X 



CO 



< 1- I- 

■* = 

°- I 

U £ - 1 - 



Figure 4 Validation of putative MYB targets with siRNA 
treatments. Top: qRT-PCR measurements of c-myb mRNA levels in 
the K562 leukemia cell line without treatment (left), treated with the 
siRNA against c-myb (middle), and treated with the siRNA against 
GFP (right). Bottom: Relative mRNA levels of selected genes under 
the c-myb siRNA treatment with respect to the qRT-PCR 
measurements under the control siRNA treatment. Green bars: 
expression responses of putative MYB targets. Blue bars: expression 
responses of control genes. 



Associations with aberrations on individual genes or 
microRNAs reveal the causal relations of their targets 

Associations with aberrations on individual genes or 
microRNAs provide direct causal explanations for gene 



expressions. 894 mRNA expressions are associated with 
aberrations on individual genes or microRNAs - muta- 
tions, DNA methylations of genes and microRNA 
expressions. They are grouped into 21 modules and are 
reported in Table 1. The causal relations between dri- 
vers and passengers are supported by previous studies in 
20 association modules. 

Positive and negative associations with gene mutations 
comprise 12 mRNA expression modules and 522 passen- 
ger genes. APC encodes a tumor suppressor protein that 
acts as an antagonist of the Wnt signaling pathway and is 
involved in other processes such as cell migration and 
adhesion. APC is mutated exclusively in colon cancer cell 
lines. APC mutations are positively associated with 136 
passenger genes and negatively associated with 56 pas- 
senger genes (Additional file 1, Figure S9). More previous 
studies support the positive associations with APC muta- 
tions: 21 passenger genes are co-cited with APC in pre- 
vious publications. Some of these passenger genes are 
members or downstream targets of the Wnt signaling 
pathway: BMP4, CD9, CDH1, GPX2, HDAC1, PPARG, 
and SOX9 (Additional file 4, Table S3). Mutations of 
APC activate the Wnt pathway thus up-regulate its target 
genes. Other passenger genes are associated with APC 
aberrations in selected tissue or disease samples: DDR1, 
DLG3, ETS2, HOXA10, ITGB4, MEST, PLA2G4B, 
SIGIRR, and SPINT2 (Additional file 4, Table S3). In 
contrast, only 4 negative associations with APC are sup- 
ported by prior publications. APC and PSMD2 are in the 
pathway of P-catenin degradation (Additional file 5, 
Table S4). APC and CMTM3, EXOl and HLTF are 
reported to be hyper- methylated in selected colon cancer 
samples (Additional file 4, Table S3). 

CDKN2A (TP16) encodes a tumor suppressor that 
inhibits cyclin-dependent kinase 4 (CDK4) in cell cycle 
control. It possesses loss-of-function mutations (mostly 
frameshift insertions/deletions) in 33 samples. CDKN2A 
mutations are negatively associated with 43 passenger 
genes and positively associated with 32 passenger genes 
(Additional file 1, Figure S10). Three negative associa- 
tions are supported by previous studies: ERBB3, FOS 
and HSD17B4 (Additional file 4, Table S3). Six positive 
associations are supported by previous studies: CAV1, 
LIMK1, MSN, THBD, TRDMT1, and YWHAG (Addi- 
tional file 4, Table S3). Moreover, the positively asso- 
ciated genes are enriched with the GO category of cell 
motility (p-value < 8.51 x 10" 4 ) and signaling pathways 
of cell motility (p-value < 2.28 x 10" 3 ) and integrin (p- 
value < 2.42 x 10" 3 ). 

TP53 encodes a master regulator for apoptosis, cell 
cycle control and senescence in response to stress con- 
ditions. TP53 mutations are negatively associated with 
35 passenger genes and positively associated with 33 
passenger genes (Additional file 1, Figure S9). Negative 
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associations are highly enriched with known TP53 tar- 
gets from multiple lines of evidence. First, the passenger 
genes are enriched with a list of 45 known TP53- 
responsive genes confirmed by ChlP-Seq assays ([50], 7 
of 35 genes, hyper-geometric p-value < 7.6608 x 10" 12 ). 
Second, promoters of the passenger genes are enriched 
with TP53-binding motifs (16 of 35 genes, hyper-geo- 
metric p-value < 0.0041, Table 3). Third, 11 of the 35 
passenger genes are co-cited with TP53 in previous pub- 
lications (Table 4): BAX, CCNG1, CDKN1A, DDB2, 
EGFL7, FDXR, GDF15, LTBR, MDM2, TNFRSF10B 
(Additional file 4, Table S3). Most of these co-citations 
pertain to well-known pathways of TP53 regulation. For 
instance, MDM2 is a ubiquitin ligase and transcriptional 
target of TP53, CDKN1A (TP21) is a cyclin dependent 
kinase inhibitor and transcriptional target of TP53, BAX 
is a BCL2-associated protein involved in apoptosis. 
Fourth, the passenger genes are enriched with GO cate- 
gories of caspace activation (hyper-geometric p-value < 
0.001), cyclin-dependent kinase activity (p-value < 
0.0025), cell cycle arrest (p-value < 0.0069), and path- 
ways of TP53 signaling (p-values < 1.55 x 10" 5 ,< 4.14 x 
10" 4 ), and G2/M checkpoint (p-value < 0.0012) (Table 
5). In contrast, positive associations with TP53 muta- 
tions are not enriched with known TP53 targets or pro- 
moters containing TP53-binding motifs. Negative 
associations with TP53 mutations imply positive regula- 
tion of TP53 since most TP53 mutations are loss-of- 
function mutations. 

161 mRNA expressions are negatively associated with 
DNA methylations of five gene clusters (Table 1). They 
include the association modules of PAX8 (61 passenger 
genes), BCR (58 passenger genes), HOXC13 (19 passen- 
ger genes), CCND1 and PPARG (13 passenger genes), 
and COL5A1 (10 passenger genes). The passenger genes 
of PAX8 and HOXC13 modules are enriched with the 
binding motifs of their drivers (PAX8: 34 of 61 genes, 
hyper-geometric p-value < 0.0014, HOXC13: 12 of 19 
genes, hyper-geometric p-value < 0.016). Furthermore, a 
passenger gene in the CCND1/ PPARG module, KLF4, is 
regulated by both CCND1 and PPARG from multiple 
previous studies (Additional file 4, Table S3). 

241 mRNA expressions are negatively associated with 
two clusters of microRNA expressions (Table 1). Asso- 
ciation module 81 possesses mir-92, mir-96, mir-106a, 
mir-20b, mir-17, mir-19b, mir-32, mir-135, mir-25, mir- 
106b, mir-93, mir-106, mir-18, and mir-20 as drivers 
and 121 passenger genes. Association module 82 con- 
tains mir-24, mir-99b, mir-27b, mir-21, mir-125a, mir- 
23a, mir-23b, and mir-27a as drivers and 120 passenger 
genes. The passenger genes in both modules are 
enriched with the predicted targets of their driver 
microRNAs (Table 3, association module 81: mir-96, p- 
value < 0.015, mir-106a, p-value < 0.046, mir-17, p- 



value < 0.038, mir-93, p-value < 0.032; association mod- 
ule 82: mir-21, p-value < 0.001). Therefore, some pas- 
senger mRNA expressions in these modules are 
probably regulated by their driver microRNAs. 
Gene clusters with coherent expression profiles are likely to 
be regulated by transcription factors 

A unique characteristic of NCI-60 mRNA expression 
data is the presence of a few large clusters with highly 
coherent expression profiles (Additional file 1, Figures 
SI, S2, S3, S4, S5, S6, S7, S8, S9, S10, Sll, S12, S13). 
Two of the largest clusters are not associated with any 
observed molecular aberrations. We identified the tran- 
scription factors that were associated with the mRNA 
expressions in these clusters as candidates for inter- 
mediate regulators. These transcription factors may reg- 
ulate the coherent expressions of downstream genes, yet 
the drivers that control the regulator expressions are 
unknown. Two association modules with intermediate 
regulators are reported in Table 1. Module 83 contains 
SMAD3, FOXD1, PLAU, BDNF, FOSL2, FOSL1 and 
RBMS1 as regulators and 338 passenger genes. Module 
84 contains ERG, ELK4, NFATC1, RFX3, POU5F1 and 
ZNF350 as regulators and 338 passenger genes. The 
expression profiles of the two modules are shown in 
Figure 5. 

Associations of regulators and passengers in these 
modules are supported by rich prior studies. Promoters 
of module 83 passengers are highly enriched with 
SMAD3-binding motifs (Table 3, 271 of 338 genes, 
hyper-geometric p-value < 1.058 x 10" 7 ). SMAD3 
encodes a transcriptional modulator activated by trans- 
forming growth factor-/? (TGF/3). Indeed, many passen- 
ger genes are members or downstream targets of the 
TGF/3 signaling pathway according to previous studies, 
such as TGFBR2, ADAM12, CAV1, CCNB1, and EGFR 
(Table 6 and Additional file 4, Table S3). In addition, 
many passenger genes are co-cited with other candidate 
regulators (PLAU, BDNF, FOSL1) in previous publica- 
tions. For instance, associations of BDNF and passenger 
genes such as ADAM17, AXL, and BCAT1 in neuron 
cells are previously reported (Additional file 4, Table 
S3). 

Promoters of module 84 passengers are enriched with 
the binding motifs of NFATC1 (265 of 388 genes, 
hyper-geometric p-value < 3.15 x 10" 5 ), POU5F (18 of 
338 genes, hyper-geometric p-value < 0.0027), ERG (67 
of 338 genes, hyper-geometric p-value < 0.0093), RFX3 
(75 of 338 genes, hyper-geometric p-value < 0.0481), 
and ELK4 (76 of 338 genes, hyper-geometric p-value < 
0.0518). Many passenger genes are co-cited with some 
of these regulators in previous studies (Table 4). 
Most association modules exhibit tissue-specific expressions 
NCI-60 cell lines constitute 9 distinct tissue origins. The 
patterns of molecular aberrations and gene expressions 
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Figure 5 Gene expressions of two TF expression association modules. Top: From the top to bottom: expressions of putative regulators of 
the module (SMAD3, FOXD1, PLAU, BDNF, FOSL2, F0SL1, RBMS1) and expressions of passenger genes. Bottom: From the top to bottom: 
expressions of putative regulators of the module (ERG, ELK4, NFATC1, RFX3, POU5F1, ZNF350) and expressions of passenger genes. Regulators 
are ordered by hierarchical clustering with their aberration profiles, and their dendrogram are displayed on the left of the panel. Passengers are 
sorted by their average similarity to the driver aberration profile. The top row has the highest similarity. BR: breast cancers, CNS: brain tumors, 
CO: colorectal cancers, LE: leukemias, ME: melanomas, LC: lung cancers, OV: ovarian cancers, PR: prostate cancers, RE: renal cancers. Samples 1 
and 5 are estrogen-positive breast cancers, and samples 2-4 are estrogen-negative breast cancers. 



are likely to vary between tissue types. The inferred 
modules capture strong associations between drivers 
and passengers across all samples. Strong associations, 
however, can be manifested by either tissue-specific or 
tissue-independent patterns. These two types of patterns 
are illustrated in Figure 3. The driver of module 44 (seg- 
ment 18 CNV) is amplified in melanoma cell lines alone, 
and its passenger expressions are up-regulated in mela- 
noma cell lines. In contrast, the driver of module 66 
(TP53 sequence) has prevalent mutations among all tis- 
sue types, and its passenger expression patterns are 
independent of tissue types. To check whether the 
effects of dysregulation are universal or tissue-specific, 
we extracted the tissue-specific patterns of drivers and 
passengers in each module (see descriptions in Materials 
and Methods and Additional file 1, Text SI) and visua- 
lized them in Figure 6. We observed the following intri- 
guing results. First, about half of the cis-acting modules 
(internal segment CNVs as drivers) are coherently up or 
down regulated in at least one tissue type. In contrast, 
the majority of the trans-acting modules (all other types 
of drivers) have the same property. Second, despite 
many association modules possess tissue-specific pat- 
terns, none of them are coherently up or down 



regulated among all tissue types. In other words, a mod- 
ule has incoherent driver aberrations and passenger 
expressions in at least one tissue type. Third, the tissue- 
specific patterns are dominated by four tissue types: 
melanoma, leukemia, colon and central nervous system. 
For instance, 8 and 4 of 41 trans-acting modules contain 
up and down regulated passengers in leukemia respec- 
tively. 8 and 9 of 41 trans-acting modules contain up 
and down regulated passengers in melanoma respec- 
tively. 5 trans-acting modules contain differentially 
expressed passengers in renal cancers. In contrast, very 
few modules possess tissue-specific patterns on lung, 
ovarian or prostate cancers. Fourth, no module is coher- 
ently up or down regulated in breast cancer cell lines. 
Yet several of them possess coherent expressions in 
estrogen (ER)-positive or ER-negative samples. For 
instance, module 83 (Figure 5) passengers are up-regu- 
lated in ER-negative samples and down-regulated in ER- 
positive samples. Fifth, tissue-independent patterns 
belong to transacting modules with external segment 
CNVs or mutations as drivers. Five of 14 modules asso- 
ciated with mutations are tissue-independent: module 
65 (CDKN2A), module 68 (TP53), module 70 (KRAS), 
module 74 (PIK3CA) and module 75 (PTEN). The 
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Figure 6 Tissue-specific patterns of mRNA association modules. Edch row represents the tissue-specific pdttern of a module. Red: coherent 
up-regulation in a tissue type. Green: coherent down-regulation in a tissue type. Black: incoherent expressions in a tissue type. Cis-acting 
modules range from 1 to 43. Trans-acting modules range from 44 to 84. BRER+: ER-positive breast cancers, BRER-: ER-negative breast cancers, 
CNS: brain tumors, CO: colorectal cancers, LE: leukemias, ME: melanomas, LC: lung cancers, OV: ovarian cancers, PR: prostate cancers, RE: renal 
cancers. 
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microRNA FDRs exhibit same trend as the mRNA 
FDRs. Associations with segment CNVs yield the lowest 
rates, followed by associations with mutations and tran- 
scription factor expressions. Associations with DNA 
methylation yield the worst performance. Associations 
with microRNA expressions and local segment CNVs 
were not considered. The overall FDRs calculated by the 
two methods are 0.292 and 0.394 respectively. 

Information about regulators and functions of micro- 
RNAs is much more scattered than those of protein- 
coding genes. We incurred a batch search on PubMed 
to identify co-citations of drivers/ regulators and passen- 
ger microRNAs in these modules. Overall, co-citations 
of drivers/regulators and passengers were found in 9 
associations modules. Table 4 and Additional file 4, 
Table S3 report the co-citation outcomes for microRNA 
modules. 

Cis-acting effects with segment CNVs are absent on 
microRNAs 

A remarkable distinction between mi-croRNA and 
mRNA associations is the lack of associations with local 
segment CNVs in microRNA modules. Mi-croRNA 
expressions are poorly correlated with their local seg- 
ment CNVs compared to mRNA expressions (Addi- 
tional file 1, Figure S18). The only possible exception is 
a module of microRNAs located on chromosome 14p 
(Additional file 3, Table S2). These microRNAs exhibit a 
coherent expression pattern (Additional file 1, Figure 
S16) and are localized on chromosome 14p. Thus they 
are likely to be modulated by local segment CNVs. Yet 
we cannot verify these associations since chromosome 
14p is not covered by the CNV data. 
Associations suggest the existence of feedback loops 
between microRNAs and protein-coding genes 
Co-citation search outcomes demonstrate bidirectional 
regulatory links between the driver/regulator genes and 
passenger mircroRNAs. Table 7 summarizes these regu- 
latory links reported from prior studies. Most drivers/ 
regulators of these microRNA modules are known or 
predicted targets of the passenger microRNAs. Hence 



there are possible bidirectional links connecting tran- 
scription factors and microRNAs. In some cases, bidir- 
ectional links are reported. For instance, MAP2K4 in 
module 7 is both a known regulator and target of two 
passenger microRNAs mir-25 and mir-92. Similarly, 
TP53 in module 10 is both a known regulator and target 
of two passenger microRNAs mir-15a and mir-16. 

Discussion 

Uncovering causal relations of genes from static, obser- 
vational data alone is challenging due to the ambiguities 
of transferring associations into causal information. We 
address this problem by exploiting two widely accepted 
hypotheses in cancer genomics: (l)molecular aberrations 
on DNAs (mutations, copy number variations, DNA 
methylations) affect the levels of RNA transcripts but 
not the other way around, (2)if both cis-acting and 
trans-acting effects of segment CNVs are detected, then 
cis-acting effects are more likely to be the direct causes 
for expression changes. Based on these hypotheses we 
built association modules between putative driver mole- 
cular aberrations and passenger mRNA or microRNA 
gene expressions. Our predictions are supported by var- 
ious validation results. First, the downstream targets of 
several well-known cancer-related genes such as MITF, 
TP53 and APC are retrieved from our association mod- 
ules. Second, the passenger genes of large modules are 
enriched with the putative targets of their regulators or 
drivers. Third, putative targets of MYB are down-regu- 
lated under the MYB siRNA treatment. Fourth, associa- 
tions between molecular aberrations and microRNA 
expressions reveal the bidirectional regulatory links 
between transcription factors and microRNAs. 

One intriguing observation is that each type of mole- 
cular aberrations participate in at least one association 
module. Therefore, to track the causes of expression 
variations it is necessary to interrogate multiple types of 
molecular aberrations concurrently. NCI-60 datasets are 
sparse in mutations (24 valid genes), copy number varia- 
tions (219 valid genes) and DNA methylations (320 valid 
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genes). Consequently, about half of the mRNA expres- 
sion data (3571 genes) are not significantly associated 
with observed aberrations. This drawback will become 
less prominent in new generations of comprehensive 
cancer genomics projects (e.g., TCGA [7] and ICGC 
[8]), as they employ high density microarrays and next- 
generation sequencing to cover the entire genomes. 

Associations with driver molecular aberrations can be 
extended to clinically relevant phenotypes such as drug 
resistance [51]. Unlike mRNA or microRNA expres- 
sions, these phenotypes are not the direct products of 
molecular aberrations. Thus associations are likely to be 
mediated by the genes functionally related to the pheno- 
types. For instance, from the drug response data on 
NCI-60 cell lines [19] we found leukemia cell lines were 
sensitive to the majority of compounds tested (data not 
shown). This observation coincides with up-regulation 
of genes involved in transcriptional and translational 
processes in leukemia (Additional file 5, Table S4). Both 
drug responses and gene expressions reflect the high 
growth and division rates of leukemia cell lines relative 
to other samples. 

Two of the largest association modules (modules 83 
and 84) lack driver aberrations but consist of intermedi- 
ate regulators. In particular, multiple lines of evidence 
indicate the expressions of module 83 are controlled by 
TGFp pathway activities. Yet it is unclear which molecu- 
lar aberrations modulate the expressions of the inter- 
mediate regulators. Examination of the mutations and 
DNA methylations of members on the TGFp pathway 
may reveal their driver aberrations. 

Cis-acting and trans-acting effects with segment CNVs 
dominate the association modules of mRNA expressions 
(61 of 84 association modules). Amplifications and dele- 
tions of chromosomal segments are powerful driver 
aberrations, as they can possibly affect many genes on 
the segments and downstream targets on other loci. 
Determination of cis-acting and trans-acting effects with 
CNVs remains an open problem. We observed a ten- 
dency of cis-acting genes to cluster on certain hot- 
regions of chromosomes. Yet many cis-acting genes are 
still scattered around the entire chromosomes. The dis- 
tributions of cis and trans-acting genes will become 
more clear from the data of high density CGH arrays. 

Associations with mutations of several cancer-related 
genes - TP53, CDKN2A, APC - are largely consistent 
with their known targets. However, the passenger sets of 
positive and negative associations exhibit differential 
levels of enrichment. Passenger genes negatively asso- 
ciated with TP53 mutations (module 66) are enriched 
with TP53-binding motifs (Table 3) and known targets 
of TP53 (Table 5 and [50]). In contrast, passenger genes 
positively associated with TP53 mutations (module 68) 
do not possess the enrichment. TP53 undergoes loss-of- 



function mutations, thus negative associations imply 
positive regulation between TP53 and its targets. 
Although negative regulation of TP53 has been pre- 
viously reported [52], on NCI-60 TP53 seems to be an 
activator of its prominent direct targets. The enriched 
functional information is reversed in APC and CDKN2A 
mutations. 15 of the passenger genes positively asso- 
ciated with APC mutations (module 62) are involved in 
the Wnt pathway or associated with APC. In contrast, 
only 3 of the passenger genes negative associated with 
APC mutations (module 63) are related to APC. Simi- 
larly, 5 of the passenger genes positively associated with 
CDKN2A mutations (module 69) are related to 
CDKN2A, whereas only 2 of the passenger genes nega- 
tively associated with CDKN2A mutations (module 65) 
are related to CDKN2A. Unlike TP53, APC and 
CDKN2A are inhibitors of signaling pathways (Wnt and 
cell cycle control) and possess no direct targets. Hence 
their loss-of-function mutations would activate the 
downstream targets of the pathways. In contrast, genes 
down-regulated in APC or CDKN2A mutants are likely 
to arise from secondary effects instead of direct conse- 
quences of APC or CDKN2A mutations. 

Two large clusters of mRNA expressions are nega- 
tively associated with groups of microRNA expressions 
(modules 81 and 82). The causality between microRNAs 
and mRNAs are ambiguous since regulation can take 
place in both directions. The passenger genes are 
enriched with the predicted targets of driver micro- 
RNAs. Additional evidence is required to determine 
their causal directions. 

In NCI-60 cell lines, microRNA and mRNA expres- 
sions seem to be driven by distinct molecular aberra- 
tions. Unlike mRNA expression modules, none of the 
microRNA expression modules are associated with 
local segment CNVs. This is curious since many 
microRNAs in this study are located on the partitioned 
segments and between the protein-coding genes pos- 
sessing the cis-acting effects with segment CNVs. Poor 
correlations between microRNA expressions and their 
local segment CNVs suggest either the low resolution 
of the CGH data or the trans-acting nature of micro- 
RNA regulation. 

It is more difficult to validate the association modules 
of microRNA expressions due to the paucity of regula- 
tory and functional information. Intriguingly, we found 
that many drivers and regulators were known or pre- 
dicted targets of their passenger microRNAs (Table 7). 
In some cases, regulation along both directions between 
drivers and passengers is reported (for instance, 
MAP2K4 v.s. mir-25 and mir-92). The association out- 
comes and prior studies suggest feedback regulation 
between driver transcription factors and passenger 
microRNAs. 
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It is worth noting that pairwise associations inferred 
from the data are projections of multi-variate models from 
drivers to the target passenger. A driver links to a pas- 
senger if their association cannot be explained by other 
driver variables. Therefore, "pairwise associations" in this 
study carry a very different meaning from purely pairwise 
scores such as correlation coefficients. A more adequate 
analogy is conditional independence in probabilistic gra- 
phical models. A driver A links to a passenger X if A is 
not conditionally independent of X given any other dri- 
ver variable. Such direct and indirect relations can be 
captured by a variety of prior approaches such as Baye- 
sian networks, mutual information and regression. Our 
contribution is not to invent/introduce a new mathemati- 
cal tool to tackle the same problem, but to develop a 
data integration framework to exploit the statistical and 
mechanistic properties of the data. More specifically, the 
novelty of this work includes (l)Prioritize the order of 
adding association links according to mechanistic infor- 
mation. Cis-acting effects are first considered as they 
provide direct explanation for passenger expression fluc- 
tuations. Trans-acting effects with aberrations on DNA 
as drivers (CNV, mutations, DNA methylations) are 
taken into account only if they provide additional expla- 
natory power relative to cis-acting effects. Finally, trans- 
acting effects without DNA aberrations (microRNA or 
transcription factor expressions) are introduced on top of 
the lower layered models. (2) Project the complex net- 
work of conditional independence relations onto pairwise 
links and group these links as driver-centric association 
modules. This simplification is useful and practical as it 
is much easier to examine and validate a coherent mod- 
ule than a complex network. 

Projecting a complex regulatory network onto well- 
structured modules certainly yields information loss. 
Genes are often regulated by combinatorial interactions 
of multiple transcription factors, chromatin modifiers 
and signaling proteins. Moreover, the regulatory pro- 
grams may vary with tissue types. In this work, we 
chose to ignore combinatorial interactions and tissue- 
specific regulatory programs and focused on strong mar- 
ginal effects manifested on the variations across all 60 
cell lines. This simplification seems necessary given the 
limited size of NCI-60 data. A new generation of cancer 
genomic data - such as TCGA and ICGC - often cover 
much more patients with a specific tumor type. There- 
fore, it is possible to include the combinatorial interac- 
tions and tissue-specific regulations in an extension of 
the current models. 

The FDR values clearly depend on the constraints of 
associations. Internal segment CNVs require the physi- 
cal proximity of drivers and passengers, and external 
segment CNVs require the existence of intermediate 
regulators linking segment CNVs and passenger 



expressions. These additional constraints push the FDRs 
below significant levels. In contrast, modules of muta- 
tions, DNA methylations, microRNA and TF expres- 
sions as drivers rely only on associations between 
drivers and passengers. Thus their FDRs are substan- 
tially higher than standard significant values. To lower 
the FDRs on those modules, additional constraints such 
as evidence of physical interactions between drivers and 
passengers are probably needed. However, since the 
FDRs reported in Table 2 are the aggregate results over 
all modules of the same type, the high values do not 
deteriorate the relevance of individual modules. For 
instance, multiple lines of evidence support the modules 
associated with TP53 mutations, CDKN2A mutations, 
PAX8 methylations, as well as TF groups 1 and 2 
expressions. It is difficult to discriminate these biologi- 
cally relevant modules from potentially spurious mod- 
ules from the NCI-60 data alone. External information 
or experimental validation has to be incorporated. In 
this work, we use external information and experimental 
validation to justify the models inferred from data alone. 
A refined version of the model construction algorithm 
should include these additional information in the loop. 

Only 5 transcription factor binding motifs are signifi- 
cantly enriched on module passengers after Bonferroni 
correction (TBP, PAX8, SMAD3, NFATC1 and POU5F). 
These unsatisfactory results can be attributed to both 
the conservative nature of Bonferroni correction and the 
accuracy of motif presence as a proxy for regulation. 
We sought the presence of motifs within 5kb upstreams 
of the transcription start sites. This simple scheme may 
create many false positives and false negatives. For 
instance, only 27 of the 113 experimentally validated 
MITF targets [45] contain the MITF-binding motif 
within 5kb promoters, and only 13 of the 67 experimen- 
tally validated TP53 targets [50] contain the TP53-bind- 
ing motif within 5kb promoters. The discrepancy 
between motif presence and experimentally validated 
targets may partially explain the relatively high motif 
enrichment p-values of modules 44 (MITF) and 66 
(TP53). Restrictions to promoters with multiple motif 
occurrences reduce false positives but increase false 
negatives. The enrichment results are worse than Table 
3 (Additional file 9, Table S8). Consequently, we should 
treat motif enrichment analysis as one error-prone vali- 
dation and use it together with other validations (co- 
citations, functional category or pathway enrichments). 
Other types of validations are also subjected to error. It 
remains an open problem to systematically validate the 
large-scale models inferred from high-throughput data. 

In this work, we built association modules from NCI- 
60 data alone and attempted to validate the inferred 
modules with external information. An alternative and 
common approach is to start with prior models using 
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external information (e.g., pathways, GO categories, pro- 
tein-DNA and protein-protein interactions) and learn 
the models explained by the data. As a simple compari- 
son, we treated each pathway and GO category as the 
passenger gene set of a module and identified the dri- 
vers that significantly explained their expressions. The 
results are reported in Additional file 10, Table S9. 
Overall, only a small number of GO categories and 
pathways are significantly explained by driver aberra- 
tions in NCI-60 data. Prominent ones include genes 
involved in melanosome and melanin biosynthesis and 
explained by MITF transcription factor expressions, 
ribosome genes explained by a cluster of 9 microRNAs, 
and members of proteasome genes explained by internal 
segment CNVs on chromosome 14q. Notice that genes 
involved in melanosome and melanin biosynthesis sub- 
stantially overlap with module 44. 

Conclusions 

We proposed a computational method to build associa- 
tions between molecular aberrations (drivers) and 
mRNA/microRNA expressions (passengers) and grouped 
the genes/microRNAs into modules according to their 
associated molecular aberrations. Intriguingly, a consid- 
erable portion of these associations implied causal rela- 
tions according to in-silico and experimental validations. 
We found passengers in these association modules were 
enriched with the putative targets of the drivers, func- 
tional categories and pathways in which the drivers were 
involved, and regulatory/association links confirmed by 
prior publications. Moreover, we demonstrated that the 
predicted MYB targets were down-regulated in a MYB- 
siRNA treated leukemia cell line. The reported associa- 
tion modules indicated that gene/microRNA expressions 
in cancer were driven by diverse aberration mechanisms 
including copy number variations, mutations, DNA 
methylations, microRNA and transcription factor 
expressions. In addition, the results suggested distinct 
driver mechanisms between mRNA and microRNA 
expressions and the existence of bidirectional regulatory 
links between microRNAs and transcription factors. 

Methods 

Data sources and processing 

NCI-60 constitutes a panel of 60 cell lines derived from 
9 tissue types: breast (5 cell lines), central nervous sys- 
tem (6 cell lines), colon (7 cell lines), leukemia (6 cell 
lines), melanoma (10 cell lines), lung (9 cell lines), ovary 
(7 cell lines), prostate (2 cell lines) and kidney (8 cell 
lines). Melanoma cell line MDA-N is derived from 
MDA-MB435, thus the two cell lines are highly similar. 
Among the 5 breast cancer cell lines 2 are estrogen- 
positive (MCF7 and T47D) and the remaining 3 are 
estrogen-negative (MDA-MB-231, HS578T. BT-549). 



Seven datasets of NCI-60 cell lines were downloaded 
from the website of the Genomics and Bioinformatics 
Group at NCI (GBC): mutation analysis of 24 cancer 
genes [10], Comparative Genomic Hybridization (CGH) 
array data of DNA copy number variations [11], cytosine 
methylation profiling on promoters [12], cDNA microar- 
ray data [15], Affymetrix transcript profile data [16], and 
Agilent transcript profile data [17] of mRNA expres- 
sions, and microRNA expression data [13]. The union of 
these datasets covered 14856 genes and 303 microRNAs. 

Continuous data (CNVs, mRNA expressions, micro- 
RNA expressions, DNA methylations) were first rank- 
transformed into cumulative distribution function (CDF) 
values and then converted into probability vectors of 
trinary states using probabilistic quantization [33]. This 
transformation normalizes the data in the same scale 
and preserves the information from continuous data. 
Discrete data (mutations) was directly fed into the 
model without processing. Detailed procedures of data 
normalization and combination are reported in Addi- 
tional file 1, Text SI. 

Spatial dependency of CNV data was manifested from 
the measurements of 219 genes in a CGH array [33]. 
Adjacent probes on the same chromosome tend to have 
higher correlations than randomly selected probes. To 
exploit the spatial dependency of CGH probes, we 
devised a recursive algorithm to partition each chromo- 
some into correlated segments using the CNV data. The 
algorithm is described in Additional file 1, Text SI. 86 
segments were obtained from the CNV data and 
reported in Additional file 6, Table S5. The CNV data 
of a segment is the mean over the CNV data of all its 
constituent probes. 

Logistic regression models 

We used logistic regressions to model the effects of 
molecular aberrations on gene or microRNA expres- 
sions. Denote y the expression of a gene or microRNA 
and x the driver aberrations that explain y. The condi- 
tional probability is 

m^) = ^^ x ^y / X l >0 f Wi. (1) 

fi(x)'s are scalar feature functions specifying the rela- 
tions of x and y. X/s are nonnegative parameters, and Z 
(x) is the partition function that normalizes the condi- 
tional probabilities. In this workyj(#)'s are linear func- 
tions of feature values. fi(Xi) = x t if aberration de- 
activates expression y, fi{xi) = -x t if aberration x 
represses expression y. 

Given observed data D and two nested models M 0 ,Mi 
2 M 0 , we incurred a standard hypothesis testing proce- 
dure to calculate the log-likelihood ratio and p-value of 
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D. The pairwise scores measured the goodness of fit of 
Mi against the null model M 0 . Furthermore, given the 
models Mi and M 2 of two candidate driver aberrations 
we tested the joint model M 12 against Mi and M 2 
respectively. The testing results provide information 
whether the explanatory power of one model can be 
covered by another. Detailed procedures of parameter 
estimation, evaluation of log likelihood ratios and p- 
values as well as model selection are described in Addi- 
tional file 1, Text SI. 

Building association modules 

An association module consists of a small number of 
bona fide driver molecular aberrations and a subset of 
passenger genes or microRNAs. The expression pro- 
files of passengers are explained by fluctuations of dri- 
vers. For trans-acting effects with segment CNVs, the 
associations between drivers and passengers are 
mediated by regulators on the segments. We require 
that a valid association module should satisfy the fol- 
lowing conditions. First, a driver explains each passen- 
ger expression with a significant pairwise score. 
Second, a driver provides an explanatory power that 
cannot be replaced by any other drivers. Third, cis- 
acting effects have a higher priority than associations 
with non-local aberrations. Fourth, a module has at 
least 10 passengers. Toward this goal the association 
modules were constructed by the following proce- 
dures. 

1. For each passenger mRNA or microRNA expres- 
sion, evaluate the pairwise association scores with all 
candidate driver aberrations. Keep the associations 
that pass the threshold values of these scores. 

2. If a passenger gene or microRNA is associated 
with an external segment CNV, then find the tran- 
scription factor mediating the associations between 
external segment CNVs and passenger expressions. 

3. Rule out a transcription factor as a candidate dri- 
ver for the modules of regulatory effects if the tran- 
scription factor is associated with any observed 
molecular aberration. 

4. For each passenger mRNA or microRNA expres- 
sion, incur model selection to rule out the driver 
aberrations that can be replaced by other drivers. 
Briefly, for the models Mi and M 2 of two candidate 
driver aberrations, test the joint model M 12 against 
Mi and M 2 respectively. If M 12 is significantly better 
than Mi but does not outperform M 2 , then remove 
Mi. Apply this filtering procedure for all pairs of dri- 
ver aberrations. 

5. Group passenger genes or microRNAs into mod- 
ules by their drivers. Report the modules with > 10 



passenger members. Passengers of distinct associa- 
tion modules may overlap. 

Detailed procedures for building association modules 
are described in Additional file 1, Text SI. 

Experimental validation 

Selection of putative targets and control genes of MYB 

To validate the causal implications of association out- 
comes we selected several putative targets of MYB and 
measured their expression responses with and without 
the treatment with an MYB siRNA. Putative MYB tar- 
gets satisfy the following criteria: (l)their mRNA expres- 
sions were positively associated with segment 31 CNV, 
(2)their mRNA expressions were positively associated 
with MYB expression in NCI-60 data under normal 
conditions, (3)their mRNA expressions were positively 
associated with MYB expression in another expression 
dataset of NCI-60 cell lines exposed under radiation 
([53], Additional file 1, Figure S19), (4)their mRNA 
expressions were positively associated with MYB in a 
dataset containing 73 normal tissues [54]. 31 genes 
passed these filtering criteria. Among them we then 
selected 11 genes based on the constraints of primer 
design: CPSF6, CTCF, HCLS1, KHDRBS1, NFATC3, 
ORC1L, PAICS, POLD3, RBMX, SMC1A, and ZNF131. 

In addition to predicted MYB targets we also selected 
several control genes which were constituency 
expressed across the NCI-60 cell lines, regardless of the 
MYB expression levels. From the top 10 candidates we 
selected 6 based on the constraints of primer design: 
SSR2, GNB2, HSPC152, EIF4A1, RPS15, HINT1. 
Detailed procedures of selecting putative MYB targets 
and control genes are described in Additional file 1, 
Text SI. 

siRNA treatments and RNA measurements 

Human erythroleukemia cells (K562) were purchased 
from NCI (Frederick, MD) and maintained in RPMI 
1640 medium (Invitrogen, CA). The sequences of siR- 
NAs were adopted from previous reports: siRNA against 
c-myb (c-myb siRNA); sense: 5-UGUUAUUGCCA 
AG C ACUU A A A -3; anti-sense: 5-UAAGUGCUUGG- 
CAAUAACAGAA -3; siRNA against GFP (control 
siRNA): sense sequence: 5-UGCGCUCCUGGAC- 
GUAGCCTT-3; antisense: 5-GGCUACGUCCAG- 
GAGCGCATT-3. 

siRNA transfection was performed as described pre- 
viously with minor modifications. Briefly, cells were 
seeded at a density of 2 x 10 5 cells/well in a 6-well plate 
with 2 mL culture medium per well. Right after the 
seeding, the cells were transfected with c-myb siRNA or 
control siRNA using LipofectAMINE 2000 (Lf 2000, 
Invitrogen, CA). The siRNA/Lf2000 complex was then 
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incubated with the cells (siRNA concentration = 100 
nM) for 48 h and the cells were harvested for further 
assays. 

RNA isolation and cDNA synthesis were demon- 
strated according to the manufacturers protocols. 
Briefly, total RNA of the treated K562 cells was isolated 
and then converted to cDNA by incubating 1 ^g RNA 
with 2.5 pM 01igo(dT)23, 500 dNTP and 1 jA of 
RNase Inhibitor (New England Biolabs, Beverly, MA), 5 
mM DDT, 4 jA of 5 First-Strand buffer and 1 jA of 
Superscript III Reverse transcriptase (Invitrogen) for 1 h 
at 50 degrees Celsius in a total volume of 20 fA in a 
microtube. 

Real-time PCR was performed in a Mastercycler ep- 
gradient-S thermocycler (Eppendorf, Hamburg, Ger- 
many) with FastStart TaqMan Probe Master (ROX) and 
Universal ProbeLibrary (Roche Diagnostics GmbH, 
Manheim, Germany) according to the manufacturers 
instructions. The primers and the probe used for detect- 
ing each gene are listed in Additional file 8, Table S7. 
The amplification conditions were 10 min at 95 degrees 
Celsius, followed by 40 cycles of 95 degrees Celsius for 
15 s and 60 degrees Celsius for 1 min. The quantity was 
determined from the experimental threshold cycle on a 
standard curve of the data from a series of serial dilu- 
tion of the mixture of generated cDN A. The mRN A 
level of the gene of interest was normalized by that of 
ACTB (/3-actin) as an endogenous control. Detailed pro- 
cedures of RNA quantification are described in Addi- 
tional file 1, Text SI. The measured data of qRT-PCR 
are reported in Additional file 7, Table S6. 

In-silico validation 

Evaluation of false discovery rates 

For each type of association modules, we randomly per- 
muted aberration and expression data 1000 times and 
estimated the empirical distribution of the number of 
significant pairwise associations arising from permuted 
data. The expected number and the 99 percentile num- 
ber of false positives were calculated from this empirical 
distribution. The number of positive calls from the data 
was the number of significant pairwise associations. 
Two types of false discovery rates were evaluated 
accordingly: 

expected # false positives according to the null model # false positives in the 99 percentile of the null model 
#positive calls from the date *~ # positive calls from the date 

Enrichment analysis of putative targets on passenger genes 

19 transcription factors appear in both drivers/regulators 
of the association modules and the TRANSFAC data- 
base [37]. We extracted their binding motifs from 
TRANSFAC and 5kb promoter sequences of 27748 
human genes from the UCSC Genome Browser [55]. 
For each transcription factor, the occurrences of its 
binding motif on all promoters and on the passenger 



promoters were counted. A standard Fishers exact test 
was applied to evaluate the significance of motif enrich- 
ment on passenger genes. Motif search and Fisher's 
exact test were implemented by our own C programs. 

Two association modules of mRNA expressions con- 
tain microRNA expressions as drivers. We extracted the 
putative targets of the driver microRNAs from the 
union of three databases: TargetScan [38], microRNA. 
org [39], and miRBase [40]. Enrichment analysis was 
carried out on the passenger genes of these modules. 
Co-citation analysis on PubMed database 
We incurred a batch search on the PubMed database to 
find all the pairs of drivers/regulators and passengers 
that were co-cited in the same publications. The spur- 
ious results from the automated search were removed 
by human inspection. Manual curation also identified 
the pairs conferring regulatory or association relations. 
To assess the confidence of co-citation outcomes, for 
each module we replaced passengers with random genes 
or microRNAs and counted co-cited pairs. The maxi- 
mum numbers of co-cited pairs over 10 random trials 
are reported in Table 4. 

Enrichment analysis of functional categories and pathways 

We extracted 4822 functional categories from the Gene 
Ontology database [41] and 889 pathways from three 
pathway databases: Reactome [43], BioCarta [44], and 
the NCI Pathway Interaction Database [42]. For each 
association module, we applied standard a Fisher's exact 
test to identify enriched GO categories and pathways for 
the passenger genes. 

Extraction of tissue-specific patterns 

We extracted tissue-specific patterns of association 
modules with the following procedures. First, we 
obtained the tissue-specific pattern for each mRNA 
expression profile. An expression profile was written as 
a linear combination of "ideal" tissue-specific expression 
profiles, and the up/down regulation on a tissue type 
was determined by its mixture coefficient. An ideal tis- 
sue-specific expression profile has Is on samples belong- 
ing to the target tissue and 0s on all other samples. 
Second, for each association module we checked 
whether its passengers were enriched with tissue-specific 
genes of each tissue type. Third, we employed gene set 
enrichment analysis [21] to find tissue-specific patterns 
of drivers. Finally we reported the intersection of tissue- 
specific patterns of drivers and passengers. Detailed pro- 
cedures are elaborated in Additional file 1, Text SI. 

Additional material 



Additional file 1: Text S1 includes procedures of data processing, 
model selection, experimental and in-silico validations, heatmap 
visualizations of molecular aberrations and mRNA/microRNA 
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expressions of association modules, and distributions of correlation 
coefficients between segment CNVs and their constituent genes 
and microRNAs. 

Additional file 2: Table SI reports the association modules of 
mRNA expressions. 

Additional file 3: Table S2 reports the association modules of 
microRNA expressions. 

Additional file 4: Table S3 reports the co-cited driver/regulator- 
passenger pairs and the PubMed IDs of the citations in association 
modules. 

Additional file 5: Table S4 reports the enriched GO categories and 
pathways for each association module. 

Additional file 6: Table S5 reports the information of partitioned 
segments. 

Additional file 7: Table S6 reports the expression responses of 
putative targets and control genes in c-myb siRNA experiments. 

Additional file 8: Table S7 reports the primer sequences for RT-PCR 
in c-myb siRNA experiments. 

Additional file 9: Table S8 reports the enrichment of driver/ 
regulator binding motifs on passenger promoters with multiple 
motif occurrences. 

Additional file 10: Table S9 reports the association outcomes 
between candidate drivers and GO categories/pathways. 
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